ABSTRACT

Motivation: Base-calling of sequencing data produced by high- 
throughput sequencing platforms is a fundamental process in current 
bioinformatics analysis. However, existing third-party probabilistic or 
machine-learning methods that significantly improve the accuracy of 
base-calls on these platforms are impractical for production use due 
to their computational inefficiency. 

Results: We directly formulate base-calling as a blind deconvolution
problem and implement BlindCall as an efficient solver for this inverse
problem. BlindCall produced base-calls at accuracy comparable to
state-of-the-art probabilistic methods while processing data at rates
10 times faster in most cases. The computational complexity of
BlindCall scales linearly with read length, making it better suited
for new long-read sequencing technologies.

Availability and Implementation: BlindCall is implemented as a set of
Matlab scripts available for download at
http://cbcb.umd.edu/~hcorrada/secgen.

Contact: hcorrada@umiacs.umd.edu 

1 INTRODUCTION

Second-generation sequencing technology has revolutionized
high-throughput genomics in life science and clinical research.
The sheer scale of sequence generated by these instruments has
allowed unprecedented views into a number of molecular phenomena,
including population genetics, transcriptomics, epigenetics and
translational profiling. Both the throughput and accuracy of
second-generation sequencing instruments have increased at an
accelerated pace in the last few years due to the use of
high-resolution optics and biochemical methods that allow sequencing
of billions of DNA fragments in parallel by generating fluorescence
intensity signals that can be decoded into DNA sequences. However,
due to experimental and hardware limitations, these raw signals are
inherently noisy (Aird et al., 2011; Bravo and Irizarry, 2010; Dohm
et al., 2008; Erlich et al., 2008). Base-calling is the essential
step of converting these noisy fluorescence intensity signals into
sequences used in downstream analysis. Providing accurate base-calls
greatly reduces many difficulties in downstream bioinformatics
analysis such as genome assembly and variant calling (Alkan et al.,
2011; Bravo and Irizarry, 2010).

Sequencing-by-synthesis (Bentley et al., 2008) generates millions
of reads of short DNA sequences by measuring in parallel the
fluorescence intensity of billions of PCR-amplified and labeled
clusters of DNA from a sample of interest. The DNA fragments attach
to a glass surface where they are then PCR-amplified in situ to create
clusters of DNA fragments with identical nucleotide composition.
Sequence reads are generated from these DNA clusters in parallel and
by cycles. A single nucleotide is sequenced from all DNA clusters in
parallel by adding labeled nucleotides that incorporate to their
complementary nucleotide. This synthesizes DNA fragments
complementary to the fragments in each cluster as sequencing
progresses. A set of four images is created measuring the
fluorescence intensity along four channels to detect incorporation at
each cycle. These images are then processed to produce
fluorescence-intensity measurements from which sequences are then
inferred by base-calling. In the default base-calling process for
Illumina sequencers, called Bustard, the highest intensity in each
quadruplet of intensity measurements determines the base at the
corresponding position of the corresponding read. For current
Illumina technologies, sequencers can produce up to 600 GB per run
(Illumina, 2013).

The raw intensity signals generated by this process are known 
to be subject to several biases (Aird et al., 2011; Bravo and
Irizarry, 2010; Dohm et al., 2008; Erlich et al., 2008) (Fig. 1A
and B). (i) Cross talk: there are significant correlations between 
different nucleotide channels; (ii) phasing/pre-phasing: the signal 
in one cycle can spread to the cycles ahead and the cycles after it; 
(iii) signal decay: where signal intensities become lower in later 
sequencing cycles; (iv) background noise: the signal to noise ratio 
becomes lower in later sequencing cycles. A significant challenge 
in base-calling is accounting for these biases. 

Existing base-calling methods can be classified into two major
groups: (i) unsupervised model-based methods that capture the
sequencing-by-synthesis process in a statistical model of
fluorescence intensity from which base-call probabilities can be
extracted directly (Bravo and Irizarry, 2010; Kao and Song, 2011;
Kao et al., 2009; Massingham and Goldman, 2012) and (ii) supervised
methods that train a statistical model on a set of base-calls whereby
fluorescence intensity measurements are classified into base-calls
(Erlich et al., 2008; Kircher et al., 2009). The former methods have
been shown to significantly improve the accuracy of Bustard
base-calls. These model-based methods aim to capture the sequencing
process described above in a statistical model from which base-call
probabilities are usually obtained. While these probabilistic or
machine-learning methods improve the accuracy of base-calls, they are
impractical for production use due to their computational
inefficiency, which usually scales quadratically with read length
since most of them resort to dynamic programming for model fitting
(Kao and Song, 2011; Kao et al., 2009; Massingham and Goldman, 2012).

Fig. 1. Signal properties in the base-calling problem. (A) Fluorescence intensity measurements from one cluster for 50 sequencing cycles. Cross-talk and
signal decay effects are clearly observed in these data. Background intensity increases as sequencing progresses. (B) The phasing effect demonstrated on a
subset of data from (A). High intensity in the C channel in cycle 32 affects background intensity in the C channel in neighboring cycles.

In this article, we show that the base-calling problem can be
formulated as an optimization problem called blind deconvolution.
Based on this observation, we developed BlindCall as a method that
treats base-calling as a blind deconvolution problem (Levin et al.,
2011; Xu et al., 2013). We model the intensity signals B output by
the sequencer as the convolution of a latent sparse signal of
interest X and a convolution kernel k modeling cross-talk and phasing
biases, plus background noise N:

B = k*X+N. 

The blind deconvolution problem is to recover the latent signal X 
given only the observed B. This reduces the base-calling problem 
into solving an inverse problem that admits computationally 
efficient solutions. The blind deconvolution problem has been
an active research topic in recent years (Levin et al., 2011; Xu et al.,
2013) and we adapt methods for its solution to the base-calling
problem (Wang and Yin, 2010).

BlindCall was able to provide base-calls at accuracy comparable to
state-of-the-art probabilistic methods while processing data at
rates ten or more times faster in most cases. It scales linearly
with read length and is thus better suited for new long-read 
sequencing technologies. Direct blind deconvolution modeling 
and the ultra-efficient processing based on optimization methods 
presented here are essential for bioinformatics analysis work- 
flows to cope with increased throughput and read lengths in 
new sequencing technologies. 



2 METHODS 

BlindCall has the following architecture (Fig. 2A): a training module
uses blind deconvolution (Fig. 2B) on a randomly sampled subset (e.g.
1000 reads) of the intensity data to iteratively estimate the
convolution kernel k and produce a deconvolved signal from which
base-calling is performed. The base-calling module then uses the
convolution kernel estimated in the training module to produce a
deconvolved output signal for the entire dataset and call bases.

Fig. 2. The BlindCall architecture. (A) BlindCall consists of two
modules: the training module uses blind deconvolution (B) to
simultaneously estimate model parameters and produce a deconvolved
signal from which base-calling is done. The calling module uses the
parameters estimated in the training module to produce a deconvolved
output signal.



2.1 Blind deconvolution 

We solve the blind deconvolution problem using an iterative procedure: 
(i) fixing k and estimating latent signal X using a specific non-blind de- 
convolution method based on iterative support detection (ISD) (described 
below) and then (ii) fixing X to estimate convolution kernel k to cor- 
rect for cross-talk and phasing effects. We divide the signal into non- 
overlapping windows: in each 20-cycle window we assume an invariant 
convolution kernel. The discrete convolution can be written as matrix 
multiplication B = KX, where K is a convolution matrix constructed 
from the kernel k. A normalization procedure is used in each iteration 
to account for intensity biases across channels. 
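
As a minimal sketch of the matrix form B = KX, the Python snippet below builds a convolution matrix from a one-dimensional, three-tap phasing kernel and checks that multiplying by it matches direct convolution. The kernel values, the zero-padded boundary handling and all function names are illustrative assumptions, not taken from the BlindCall scripts, and cross-talk between channels is left out.

```python
def convolve_same(x, k):
    """Centered discrete convolution of x with kernel k, zero padded at the ends."""
    half = len(k) // 2
    out = [0.0] * len(x)
    for n in range(len(x)):
        for m, km in enumerate(k):
            src = n - (m - half)          # cycle whose signal leaks into cycle n
            if 0 <= src < len(x):
                out[n] += km * x[src]
    return out


def convolution_matrix(k, n):
    """Banded n x n matrix K such that K times x equals convolve_same(x, k)."""
    half = len(k) // 2
    K = [[0.0] * n for _ in range(n)]
    for row in range(n):
        for m, km in enumerate(k):
            col = row - (m - half)
            if 0 <= col < n:
                K[row][col] = km
    return K


def matvec(K, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in K]


# Hypothetical toy data: two incorporation events in one channel over 8 cycles.
k = [0.1, 0.8, 0.1]                        # assumed phasing kernel (sums to 1)
x = [0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0]
K = convolution_matrix(k, len(x))
b_matrix = matvec(K, x)                    # B = K X
b_direct = convolve_same(x, k)             # B = k * X
```

Both routes give the same blurred signal; the matrix form is convenient because the later least-squares steps can then be written with ordinary linear algebra.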

2.2 Channel intensity normalization

Intensity data for Illumina sequencing show certain biases, specifically 
(i) signal strength variation across channels, (ii) signal strength variation 
across clusters and (iii) signal decay over sequencing cycles. For accurate 
base-calling, these biases must be addressed through normalization. 
Traditionally, read normalization is applied to tackle the second and 
third problems first, in order to address the first problem. In our 
method, we circumvent the read normalization problem by analyzing 
the relative intensity ratio of successive calls across sequence reads. 

After an initial deconvolution in which cross-talk is corrected, we
normalize each channel by scaling the intensities across reads by the
same quantile (95%) in the respective channels and select the
strongest channel after normalization as candidate base-calls. We
then select successive candidate calls that are of different bases
and construct a set of linear equations of the form
$x_{i_k} - r_k x_{j_k} = 0$, where $x_{i_k}$ and $x_{j_k}$ are the
relative intensities of the channels in the k-th relation and $r_k$
is the observed intensity ratio for the k-th relation. The set of
linear equations is then $Rx = 0$, where R is an $M \times 6$ matrix,
with M being the total number of base-call pairs under consideration.
To estimate x, we solve a least-squares problem under the constraint
that $\|x\|_2 = 1$. The solution is obtained by solving an eigenvalue
problem, since the problem can be formulated as the Rayleigh quotient
$\min_{\|x\|_2 = 1} \|Rx\|^2$, and its solution must satisfy the
eigenvalue equation $R^T R x = \lambda x$. Since the number of
base-calls varies across channels, the solution of this optimization
problem favors channels that are called frequently. We therefore
normalize the problem using the number of base-calls and solve the
generalized eigenvalue problem $R^T R x = \lambda D x$, where D is a
diagonal matrix that records the number of base-calls in each
channel. This formulation can be interpreted as finding the stable
state of a normalized non-linear diffusion, and is used in normalized
cut (Shi and Malik, 2000), Laplacian Eigenmaps (Belkin and Niyogi,
2001), and PageRank (Page et al., 1999). The estimated vector x is
the relative intensity of each channel and we use it to normalize
each channel in subsequent steps.
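
The generalized eigenvalue step can be illustrated with a small pure-Python sketch. It assumes one unknown per channel (four columns, which may differ from the matrix dimensions used by the authors), noise-free ratios, a diagonal D that simply counts how often each channel appears in a relation, and a shifted power iteration as the eigen-solver; all names and values are hypothetical.

```python
import math


def relative_channel_intensity(relations, n_channels=4, n_iter=5000):
    """relations: list of (i, j, r), each encoding x[i] - r * x[j] = 0."""
    # Accumulate A = R^T R and the diagonal of D (calls per channel).
    A = [[0.0] * n_channels for _ in range(n_channels)]
    d = [0.0] * n_channels
    for i, j, r in relations:
        row = [0.0] * n_channels
        row[i], row[j] = 1.0, -r
        for a in range(n_channels):
            for b in range(n_channels):
                A[a][b] += row[a] * row[b]
        d[i] += 1.0
        d[j] += 1.0
    # Symmetric form of A x = lambda D x: S y = lambda y with
    # S = D^{-1/2} A D^{-1/2} and y = D^{1/2} x.
    s = [1.0 / math.sqrt(v) for v in d]
    S = [[s[a] * A[a][b] * s[b] for b in range(n_channels)] for a in range(n_channels)]
    # Power iteration on (c I - S), with c = trace(S), converges to the
    # eigenvector of S that has the smallest eigenvalue.
    c = sum(S[a][a] for a in range(n_channels))
    y = [1.0] * n_channels
    for _ in range(n_iter):
        y = [c * y[a] - sum(S[a][b] * y[b] for b in range(n_channels))
             for a in range(n_channels)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
    x = [s[a] * y[a] for a in range(n_channels)]
    return [v / x[0] for v in x]            # report relative to channel 0


# Hypothetical ground truth used only to generate consistent toy ratios.
true_scale = [1.0, 2.0, 0.5, 1.5]
toy_relations = [(i, j, true_scale[i] / true_scale[j])
                 for i in range(4) for j in range(i + 1, 4)]
estimated = relative_channel_intensity(toy_relations)
```

With exact ratios the smallest eigenvalue is zero and the recovered vector is proportional to the scales used to generate the data; with real, noisy ratios the same eigenvector is the constrained least-squares estimate.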

2.3 Sparse signal reconstruction through ISD 

To perform base-calling we need to reconstruct the latent sparse
signal X, corresponding only to nucleotide incorporation
measurements, given a convolution kernel k. A straightforward
$\ell_2$ optimization problem to estimate the latent signal X
minimizes $\|B - k * X\|^2$. We know the latent signal is sparser than
the observed signal, so we add this property as a constraint to the
least-squares problem and use an iterative procedure to solve the
problem under the sparsity constraint. This idea is termed ISD in the
mathematical community (Wang and Yin, 2010), and can also be applied
to deconvolution problems stemming from image deblurring
applications. In our case, the support (non-zero entries) detected
for the latent signal X corresponds exactly to base-calls. Assuming
$X_{supp}$ is the signal taking non-zero values only in the support
set obtained using our support detection algorithm, we want to find
an X that minimizes

$$\|B - k * X\|^2 + \lambda \|X - X_{supp}\|^2.$$

This optimization outputs a corrected signal subject to the support
set constraint. The support detection procedure is critical to the
output accuracy: if the support set is correct, we are close to our
solution. At the beginning, we have no knowledge of the support set,
since knowing it would directly give us the answer. To tackle this,
we use an increasing series $\{\lambda_{itr}\}$ that puts increasing
weight on the second term. This weight is low at first since the
support set is not yet accurate. As we gradually refine the estimates
we increase this weight. In our implementation, support detection is
conducted by incorporating the channel-normalization method discussed
in the previous section and picking the strongest normalized channel.
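
The iteration described above can be sketched in pure Python as follows. This is an illustration under simplifying assumptions rather than the authors' Matlab implementation: a known three-tap phasing kernel shared by all channels, no cross-talk, no noise, no channel normalization, and an arbitrary increasing weight series. Each pass solves the normal equations $(K^T K + \lambda I) X = K^T B + \lambda X_{supp}$ of the objective above, then re-detects the support by keeping the strongest channel per cycle.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def conv_matrix(k, n):
    """Banded matrix for a centered, zero-padded convolution with kernel k."""
    half = len(k) // 2
    K = [[0.0] * n for _ in range(n)]
    for row in range(n):
        for m, km in enumerate(k):
            col = row - (m - half)
            if 0 <= col < n:
                K[row][col] = km
    return K


def isd_base_call(B, k, weights=(0.01, 0.1, 1.0, 10.0)):
    """B[c][t]: intensity of channel c at cycle t. Returns a channel index per cycle."""
    n_ch, n = len(B), len(B[0])
    K = conv_matrix(k, n)
    KtK = [[sum(K[r][i] * K[r][j] for r in range(n)) for j in range(n)] for i in range(n)]
    KtB = [[sum(K[r][i] * b[r] for r in range(n)) for i in range(n)] for b in B]
    supp = [[0.0] * n for _ in range(n_ch)]     # X_supp is unknown at the start
    calls = [0] * n
    for lam in weights:                         # increasing weight on ||X - X_supp||^2
        A = [[KtK[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
        X = [solve(A, [KtB[c][i] + lam * supp[c][i] for i in range(n)])
             for c in range(n_ch)]
        # Support detection: keep only the strongest channel in each cycle.
        supp = [[0.0] * n for _ in range(n_ch)]
        for t in range(n):
            calls[t] = max(range(n_ch), key=lambda c: X[c][t])
            supp[calls[t]][t] = X[calls[t]][t]
    return calls


# Hypothetical toy read: channel index of the incorporated base at each cycle.
truth = [0, 1, 2, 3, 0, 0, 2, 1]
kernel = [0.1, 0.8, 0.1]
K_toy = conv_matrix(kernel, len(truth))
X_true = [[1.0 if base == c else 0.0 for base in truth] for c in range(4)]
B_toy = [[sum(K_toy[t][s] * x[s] for s in range(len(truth))) for t in range(len(truth))]
         for x in X_true]
calls = isd_base_call(B_toy, kernel)
```

On this noise-free toy read the detected support reproduces the simulated sequence; the example is only meant to show the control flow of alternating between the regularized least-squares solve and support detection.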

We provide further mathematical justification as to why this
iterative procedure recovers the clear intensity signals of
incorporation events. By analogy with applications in image
deblurring, we refer to the convolved signal B as the blurred signal,
and to the latent signal X as the clear signal.

Observation 1: Assume the clear signal is a non-negative signal with
spikes, the convolution (blur) kernel is non-negative and
$\|k\| = 1$; then the convolved (blurred) signal is denser than the
latent (clear) signal.

This observation holds for all blurs, since the blur spreads the
spikes and thus creates more non-zero intensities, so the support set
becomes larger for the blurred signal. This observation suggests
designing an optimization that favors sparse solutions:

$$\min_X \|B - k * X\|^2 + \lambda \|X\|_p, \quad 0 < p < 1.$$

The second term is a sparsity-inducing penalty. This sparse
regularization problem is well known in wavelet analysis (Mallat,
2009). We also have the following observation.

Observation 2: By comparing the $\ell_p$ norm ($0 \le p \le 2$) of the
clear and blurred signals, we find that the sparse norm penalty favors
the clear signal.

As special cases:

• The $\ell_1$ norm measures the total variation of the signal, thus the
blurred signal and the clear signal have the same $\ell_1$ norm.

• The $\ell_2$ norm of the blurred signal is smaller than that of the
clear signal.

• The support set of the blurred signal is larger than that of the clear
signal, therefore it has a larger $\ell_0$ cost.
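
These three special cases can be checked numerically on a toy signal. The sketch below assumes a non-negative three-tap kernel that sums to one and spikes placed away from the ends of the signal so that no intensity is lost at the boundary; the values and helper names are illustrative only.

```python
import math


def blur(x, k):
    """Centered, zero-padded convolution of x with kernel k."""
    half = len(k) // 2
    return [
        sum(km * x[n - (m - half)]
            for m, km in enumerate(k)
            if 0 <= n - (m - half) < len(x))
        for n in range(len(x))
    ]


def l0(x, tol=1e-12):
    """Number of non-zero entries (size of the support set)."""
    return sum(1 for v in x if abs(v) > tol)


def l1(x):
    """Sum of absolute values."""
    return sum(abs(v) for v in x)


def l2(x):
    """Euclidean norm."""
    return math.sqrt(sum(v * v for v in x))


clear = [0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0]   # hypothetical spike signal
kernel = [0.1, 0.8, 0.1]                           # assumed blur kernel
blurred = blur(clear, kernel)

same_l1 = abs(l1(clear) - l1(blurred)) < 1e-12     # l1 norm is preserved
smaller_l2 = l2(blurred) < l2(clear)               # l2 norm shrinks
larger_l0 = l0(blurred) > l0(clear)                # support grows
```

A penalty that grows with the support size therefore separates the two signals, whereas an $\ell_1$ penalty alone cannot and an $\ell_2$ penalty would prefer the blurred one.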

The above observations suggest that we use a sparse norm to penalize
the blurred signal and make it resemble the clear signal. Thus, we
analyze the deconvolution model with an $\ell_0$ penalty:

$$\min_X \|B - k * X\|^2 + \alpha \|X\|_0.$$

By introducing an auxiliary variable w and using an exterior penalty
technique, the above minimization problem is equivalent to solving the
following optimization problem:

$$\min_{X, w} \|B - k * X\|^2 + \alpha \|w\|_0 + \lambda \|w - X\|^2, \quad \lambda \to +\infty.$$

One strategy to solve the above optimization is the alternating
minimization technique (Wang et al., 2008), which casts the problem
into two sub-problems. (i) Fixing X and analyzing the terms containing
w, we have the w sub-problem:

$$\min_w \|w - X\|^2 + \frac{\alpha}{\lambda} \|w\|_0.$$

The solution can be found by entry-wise comparison (Mallat, 2009; Xu
et al., 2013) and the result is the so-called hard thresholding:

$$w_i = \begin{cases} x_i, & \text{if } |x_i| > \sqrt{\alpha/\lambda}, \\ 0, & \text{otherwise.} \end{cases}$$

(ii) Fixing w and analyzing the terms containing X, we have

$$\min_X \|B - k * X\|^2 + \lambda \|w - X\|^2.$$

This optimization problem has the same form as our deconvolution
model when $w = X_{supp}$. In our ISD method, $X_{supp}$ is obtained by
adaptive hard thresholding, where $\alpha$ is set adaptively to select
exactly one non-zero element into the support set by selecting the
channel with maximum intensity. Thus, our ISD method solves an
optimization problem with an $\ell_0$ penalty favoring sparse signals
corresponding to nucleotide incorporation.
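
As a small illustration of the two thresholding rules just described, the sketch below implements fixed hard thresholding for the w sub-problem and the adaptive variant that keeps exactly one channel per cycle. The threshold follows from comparing the two candidate costs for each entry, $(\alpha/\lambda)$ for keeping $x_i$ and $x_i^2$ for zeroing it; the function names and input values are hypothetical.

```python
import math


def hard_threshold(x, alpha, lam):
    """Entry-wise minimizer of (w - x)^2 + (alpha / lam) * [w != 0]."""
    cut = math.sqrt(alpha / lam)
    return [v if abs(v) > cut else 0.0 for v in x]


def strongest_channel_support(cycle_intensities):
    """Adaptive variant: keep the largest channel of one cycle, zero the rest."""
    best = max(range(len(cycle_intensities)), key=lambda c: cycle_intensities[c])
    return [v if c == best else 0.0 for c, v in enumerate(cycle_intensities)]


# Fixed threshold: sqrt(0.25 / 1.0) = 0.5
w = hard_threshold([0.1, -0.9, 0.4, 0.6], alpha=0.25, lam=1.0)

# Adaptive threshold on one cycle of four (already normalized) channel intensities
s = strongest_channel_support([0.12, 0.05, 0.91, 0.30])
```

Choosing the cut-off per cycle, rather than globally, is what guarantees that every cycle contributes exactly one base-call.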

2.4 Convolution kernel estimation 

Given the latent signal X, we use a least-squares method to estimate
the convolution kernel k modeling cross-talk and phasing effects by
solving:

$$\min_k \|B - k * X\|^2.$$

Table 1. Base callers accuracy and runtime comparison

| | Bustard | AYB | BlindCall slow | BlindCall fast | freeIbis |
|---|---|---|---|---|---|
| Perfect reads | 1 446 079 | 1 532 000 | 1 509 451 | 1 508 779 | 1 530 099 |
| Error rate (%) | 0.29 | 0.21 | 0.23 | 0.23 | 0.21 |
| Time (min) | 17 | 217 | 8/12 | 4/8 | 9/126 |
| Assembly 5x: N50 / Maximum | 610 / 1122 | 628 / 1155 | 629 / 1164 | 623 / 1167 | 649 / 1184 |
| Assembly 10x: N50 / Maximum | 3375 / 3469 | 3198 / 3322 | 3382 / 3487 | 3389 / 3485 | 3306 / 3418 |
| Assembly 20x: N50 / Maximum | 4466 / 4478 | 4627 / 4637 | 4511 / 4523 | 4470 / 4483 | 4333 / 4357 |

Accuracy and run times for Bustard, AYB, freeIbis and BlindCall for a dataset of 1.9 million reads from a HiSeq 2000 run of PhiX174. BlindCall Fast corresponds to the
non-iterative version of the blind-deconvolution method. Running times for BlindCall are reported as (processing time/total time), where the total time includes reading intensity
data from disk and writing base-calls to disk. For freeIbis, we report the time as (predicting time with single thread/training time with 10 threads). BlindCall was able to
produce base-calls of comparable accuracy to AYB and freeIbis in significantly less computational time (8 min/12 min versus 217 min and 126 min, respectively). It is also
faster than Bustard (8 min/12 min versus 17 min). AYB, freeIbis and BlindCall all improve on Bustard base calls. We also compared assemblies of the PhiX174 genome using
reads generated by Bustard, BlindCall, freeIbis and AYB. The reported N50s and maximum contig lengths are averages over 100 random samples with the corresponding coverage (5x,
10x or 20x). While BlindCall is able to process data at a significantly lower computational cost, the assemblies obtained using BlindCall are of comparable quality to those
obtained using AYB or freeIbis.



We estimate convolution kernel k in two distinct steps: we use data from 
the first four cycles and only model cross-talk in the convolution kernel 
and use the blind-deconvolution iterative procedure to estimate cross-talk 
effects. We then fix the components of the convolution kernel corres- 
ponding to cross-talk effects for the remaining windows and estimate 
the components of the convolution kernel corresponding to phasing 
effects only. We assume the phasing effect is the same across channels. 
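
The least-squares kernel estimate can be sketched for the simplest case of a single channel and a three-tap phasing kernel. The snippet assumes the latent signal X is known exactly and noise-free, ignores cross-talk and the 20-cycle windowing, and solves the normal equations directly; the helper names and toy values are hypothetical.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def shifted(x, s):
    """x delayed by s cycles (s may be negative), zero padded."""
    n = len(x)
    return [x[i - s] if 0 <= i - s < n else 0.0 for i in range(n)]


def blur(x, k):
    """Forward model without noise: B = k * X, centered and zero padded."""
    half = len(k) // 2
    cols = [shifted(x, m - half) for m in range(len(k))]
    return [sum(k[m] * cols[m][i] for m in range(len(k))) for i in range(len(x))]


def estimate_kernel(b, x, taps=3):
    """Least-squares estimate of k in min_k ||b - k * x||^2."""
    half = taps // 2
    cols = [shifted(x, m - half) for m in range(taps)]   # column m multiplies k[m]
    ata = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    atb = [sum(p * q for p, q in zip(ci, b)) for ci in cols]
    return solve(ata, atb)


# Hypothetical latent signal and kernel used to simulate the observation.
x_latent = [0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 3.0, 0.0, 0.0]
k_true = [0.1, 0.8, 0.1]
b_observed = blur(x_latent, k_true)
k_hat = estimate_kernel(b_observed, x_latent)
```

Because the convolution is linear in k once X is fixed, the problem reduces to an ordinary least-squares fit with one shifted copy of X per kernel tap.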



2.5 Deriving quality scores from deconvolved signal

We measure the quality of a base-call by the ratio of the intensity of the
strongest channel to the sum of the two strongest channels after the
deconvolution procedure. This number ranges between 0.5 and 1.0 and is
used as the raw quality score. This scheme is similar to the one in
Illumina's Bustard base-caller. Like most existing base-callers, we calibrate
these raw quality scores by aligning reads to the reference genome and
mapping raw quality scores to the alignment error rate.

2.6 Validation methods

The following datasets were used to test the accuracy and computational
efficiency of BlindCall and state-of-the-art probabilistic methods:

Illumina HiSeq 2000 phiX174: 1 926 928 single-end reads of 101 cycles
from a single tile. Data were sequenced at the University of Maryland,
College Park and are available for download at
http://cbcb.umd.edu/~hcorrada/secgen.

Ibis Test: 200K single-end reads of phiX174 over 51 sequencing cycles.

Bordetella pertussis: 100 tiles of 76-cycle single-end reads from the
coccobacillus B. pertussis, using the complete genome of the Tohama I
strain as a reference.

AYB phiX174: released with AYB and contains human sequence with a
PhiX174 spike-in.

The last three datasets were downloaded from the AYB authors' website
(http://www.ebi.ac.uk/goldman-srv/AYB/#data).

To calculate accuracy we align the reads to the phiX174 reference
using Bowtie2 (Langmead and Salzberg, 2012) with the --end-to-end
and --sensitive settings. Reported error rates are based on reads with
no more than five substitution errors, following the methodology
in Massingham and Goldman (2012). We used SparseAssembler
(Ye et al., 2012) to obtain assemblies from base-calls obtained by each
method. To derive assembly statistics, we sub-sampled 100 datasets from
the complete set of reads at 5x, 10x and 20x coverage, and performed
assemblies on each of these. We report N50 and maximum contig length
for each resulting assembly.

Version 1.9.4 of the Off-line basecaller was downloaded from Illumina
to run Bustard. Version 2 of AYB was downloaded from
http://www.ebi.ac.uk/goldman-srv/AYB. We ran AYB for 5 iterations as per
its default setting.



3 RESULTS 

BlindCall is implemented as a set of Matlab scripts available 
at http://cbcb.umd.edu/~hcorrada/secgen. As an example of 
its computational efficiency, running BlindCall on a single-core 
Matlab instance on an Intel i7 3610QM laptop with 2.3-3.3 GHz 
processor and 8 GB of memory, we found that it was able to pro- 
cess 1 million bases/s, or >85 billion bases/CPU day. We note 
that a significant portion of its running time (50%) is spent on 
disk IO to read intensity data and write the fasta/fastq outputs. 
To the best of our knowledge, BlindCall is one of the fastest 
base-callers available at this time, even though it is implemented 
in a scripting language. A port of this algorithm into a lower- 
level language (C/C++) will give further improvements on speed 
over the current Matlab version. 
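
As a quick consistency check of the throughput figures quoted above, and assuming a CPU day means 24 hours of uninterrupted single-core processing, the per-second rate converts to a daily rate as follows:

```latex
10^{6}\ \text{bases/s} \times 86\,400\ \text{s/day}
  = 8.64 \times 10^{10}\ \text{bases/day}
  \approx 86\ \text{billion bases per CPU day}
```

This agrees with the statement of more than 85 billion bases per CPU day.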

We compared the running time of BlindCall to the state-of-the-art
probabilistic base-caller AYB (Massingham and Goldman, 2012) and the
state-of-the-art supervised learning method freeIbis (Renaud et al.,
2013) on a dataset of 1.9 million reads from a PhiX174 run on an
Illumina HiSeq 2000 (Table 1). We found that BlindCall was able to
process this dataset ~20 times faster than AYB and 10 times faster
than freeIbis while retaining similar accuracy. A plot of the
per-cycle error rate of these base-callers (Fig. 3) shows that all
methods produce significant improvements over Bustard, especially in
later sequencing cycles. We observed a similar pattern when testing
other datasets (Table 2).

We also obtained better assemblies, especially at low coverage,
using BlindCall, AYB and freeIbis relative to Bustard base-calls
(Table 1). We also found that the calibrated quality values obtained
from BlindCall are very accurate (Fig. 4).

We next compared each base-calling method's ability to scale
to longer read lengths by calculating running time as a function
of read length for the same dataset (Fig. 5). Like most probabilistic
model-based base callers, AYB resorts to a dynamic programming
strategy with quadratic running time complexity with respect to the
read length. In contrast, BlindCall scales linearly with read length.
freeIbis uses a supervised learning approach, and while it also
scales linearly with read length, its training is much slower than
BlindCall (even using 10 threads for freeIbis, compared to a single
thread for BlindCall). Base-callers based on the blind deconvolution
framework will be able to scale as sequencers produce longer reads.



Fig. 3. Third-party base callers improve Bustard per-cycle error rate. We
plot the error rate of each base-caller per sequencing cycle on the PhiX174
test data. All three base callers significantly improve accuracy over
Bustard, especially in later cycles. BlindCall is able to achieve comparable
accuracy while processing data at a much faster rate.

Fig. 4. BlindCall produces accurate calibrated quality scores. We plot
observed error rates (on the PHRED scale) for Bustard, AYB and
BlindCall as predicted by quality scores and observed high correlation
for all base callers.

Fig. 5. Base-calling by blind deconvolution is scalable to long read
lengths. We compare the computational time of BlindCall with a
state-of-the-art probabilistic base caller AYB, the state-of-the-art supervised
learning method freeIbis and Illumina's Bustard on the PhiX174 dataset
reported in Table 1 as a function of the number of sequencing cycles. Since
most model-based base callers resort to a dynamic programming solution,
running time is quadratic with respect to the read length. In contrast,
BlindCall scales linearly with read length. Base callers based on the
blind deconvolution framework will be able to scale as sequencers produce
longer reads. freeIbis also scales linearly but is much slower than BlindCall.

Table 2. Accuracy comparison

| | Ibis Test: Perfect reads | Ibis Test: Error rate (%) | B. pertussis: Perfect reads | B. pertussis: Error rate (%) | PhiX174 (AYB): Perfect reads | PhiX174 (AYB): Error rate (%) |
|---|---|---|---|---|---|---|
| Bustard | 99 834 | 1.45 | 1 557 963 | 2.01 | 24 478 | 0.49 |
| AYB | 133 537 | 0.73 | 2 304 005 | 1.26 | 26 878 | 0.38 |
| BlindCall slow | 110 951 | 1.12 | 1 902 621 | 1.61 | 25 144 | 0.45 |
| BlindCall fast | 105 312 | 1.26 | 1 856 286 | 1.66 | 24 740 | 0.47 |

| BlindCall time | Ibis Test | B. pertussis | PhiX174 (AYB) |
|---|---|---|---|
| Slow | 0.08/0.3/1 | 0.11/6/10 | 0.15/14/22 |
| Fast | 0.08/0.1/1 | 0.11/3/8 | 0.15/7/16 |

Accuracy for Bustard, AYB and BlindCall on various datasets. BlindCall was able to produce comparable accuracy to state-of-the-art base callers at significantly faster
computational time. All methods improve on Bustard base calls. Run times for BlindCall are reported as (training time/processing time/total time in minutes) where the total
time includes reading intensity data from disk and writing base-calls to disk.

4 CONCLUSION

BlindCall is a simple and ultra-fast non-probabilistic base-calling
method for Illumina high-throughput sequencing data based on
blind deconvolution. We have shown that it provides comparable
accuracy to probabilistic base-calling methods while producing
base-calls at rates more than ten times faster.

Almost all probabilistic methods solve the base-calling prob- 
lem in a 'forward' way, i.e. by setting a set of basis functions and 
searching for an optimal path, which often leads to dynamic 
programming solutions. Fitting these statistical methods is com- 
putationally expensive, and will not scale as the increase in 
sequencing throughput continues. Also, a stationarity assump- 
tion must be made in order to estimate parameters in these prob- 
abilistic methods through a Markov process. In contrast, 
BlindCall models base-calling as an 'inverse' problem of blind 
deconvolution, which requires no probabilistic assumptions of 
the sequencing process. 

As steady progress has been made to improve the accuracy of 
probabilistic methods, we expect that similar progress will be 
made on non-probabilistic methods based on the blind deconvo- 
lution methods described in this article. Furthermore, these 
methods will be better suited to cope with increased throughput 
and read lengths of new sequencing technologies. 